Minimization and estimation of the variance of prediction errors for cross-validation designs
Abstract
We consider the mean prediction error of a classification or regression procedure together with its cross-validation estimates, and investigate the variance of such an estimate as a function of an arbitrary cross-validation design. We decompose this variance into a scalar product of coefficients and certain covariance expressions, such that the coefficients depend solely on the resampling design and the covariances depend solely on the probability distribution of the data. We rewrite this scalar product so that the initially large number of summands can be reduced step by step to three, provided a quadratic approximation to the core covariances is valid. We give an analytical example in which this quadratic approximation holds exactly. Moreover, in this example, we show that the leave-p-out estimator of the error depends on p only through a constant and can therefore be written in a much simpler form. Furthermore, there is an unbiased estimator of the variance of K-fold cross-validation, in contrast to a claim in the literature. As a consequence, we can show that Balanced Incomplete Block Designs have smaller variance than K-fold cross-validation. This property is confirmed in a real-data example from the UCI Machine Learning Repository. Finally, we show how to find Balanced Incomplete Block Designs in practice.
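To make the notion of a cross-validation design concrete, here is a minimal Python sketch. It rests on assumptions that are not taken from the paper: the predictor is simply the mean of the training observations, the loss is squared error, and the function names `cv_error`, `leave_p_out_design` and `k_fold_design` are hypothetical. In this toy setting the leave-p-out estimate equals the sample variance times a constant depending on p, which illustrates the kind of simplification the abstract describes; it may not be the authors' analytical example.

```python
from itertools import combinations
from statistics import mean, variance


def cv_error(y, test_sets):
    """Average squared prediction error over a cross-validation design.

    `test_sets` is the design: each element lists the indices held out
    for testing. Assumption (not from the paper): the predictor is the
    mean of the remaining training observations.
    """
    n = len(y)
    per_split = []
    for test in test_sets:
        held_out = set(test)
        train_mean = mean(y[i] for i in range(n) if i not in held_out)
        per_split.append(mean((y[i] - train_mean) ** 2 for i in held_out))
    return mean(per_split)


def leave_p_out_design(n, p):
    """All subsets of size p serve as test sets (complete design)."""
    return list(combinations(range(n), p))


def k_fold_design(n, k):
    """One partition of the indices into k interleaved folds."""
    return [list(range(fold, n, k)) for fold in range(k)]


y = [2.0, 3.5, 1.0, 4.0, 6.5, 5.0]
n = len(y)

# In this toy setting, leave-p-out depends on p only through the
# factor 1 + 1/(n - p) applied to the sample variance.
for p in (1, 2, 3):
    lpo = cv_error(y, leave_p_out_design(n, p))
    print(p, lpo, (1 + 1 / (n - p)) * variance(y))

# K-fold uses only one of the many possible partitions, so its value
# generally differs from the complete leave-p-out estimate.
print(cv_error(y, k_fold_design(n, 3)))
```

A design is represented here as a plain list of test sets, so other designs, such as a Balanced Incomplete Block Design, can be passed to `cv_error` in the same way.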
Related articles
Reliable estimation of prediction errors for QSAR models under model uncertainty using double cross-validation
BACKGROUND: Generally, QSAR modelling requires both model selection and validation since there is no a priori knowledge about the optimal QSAR model. Prediction errors (PE) are frequently used to select and to assess the models under study. Reliable estimation of prediction errors is challenging, especially under model uncertainty, and requires independent test objects. These test objects must...
Evaluation of co-kriging different methods for rainfall estimation in arid region (Central Kavir basin in Iran)
Rainfall is considered a highly valuable climatologic resource, particularly in arid regions. As one of the primary inputs that drive watershed dynamics, rainfall has been shown to be crucial for accurate distributed hydrologic modeling. Precipitation is known only at certain locations; interpolation procedures are needed to predict this variable in other regions. In this study, the ordinary cokri...
Spatial prediction of soil electrical conductivity using soil axillary data, soft data derived from general linear model and error measurement
Indirect measurement of soil electrical conductivity (EC) has become a major data source in spatial/temporal monitoring of soil salinity. However, in many cases, the weak correlation between direct and indirect measurement of EC has reduced the accuracy and performance of the predicted maps. The objective of this research was to estimate soil EC based on a general linear model via using se...
Improving the clay, silt and sand of soil prediction by removing the influence of moisture on reflectance using EPO
Moisture is one of the most important factors affecting soil reflectance spectra. Temporal and spatial variability of soil moisture reduces the capability of spectroscopy to estimate soil properties. A method that lessens the effect of moisture on soil property prediction by spectrometry is therefore needed. This paper utilises an external parameter orthogonalisation (EP...
Large-scale Inversion of Magnetic Data Using Golub-Kahan Bidiagonalization with Truncated Generalized Cross Validation for Regularization Parameter Estimation
In this paper a fast method for large-scale sparse inversion of magnetic data is considered. The L1-norm stabilizer is used to generate models with sharp and distinct interfaces. To deal with the non-linearity introduced by the L1-norm, a model-space iteratively reweighted least squares algorithm is used. The original model matrix is factorized using the Golub-Kahan bidiagonalization that proje...
Publication date: 2016